@douggallup

Changes the method of opening datasets from quest_io_plugins to the intake library (https://github.com/ContinuumIO/intake). Requires the intake-xarray plugin (https://github.com/ContinuumIO/intake-xarray) for reading rasters, and the intake_questhdf5 plugin (https://github.com/Aquaveo/intake_questhdf5) for reading hdf5 files (both XY and timeseries).

The local metadata db now has 'intake_plugin' and 'intake_args' fields for each dataset. The first names the appropriate plugin (e.g. 'rasterio' from intake-xarray, or 'quest_xyHdf5' or 'quest_timeseries_hdf5' from intake_questhdf5), and the second holds the arguments needed to use that plugin (e.g. path, chunks). The intake registry gives the list of installed plugins.
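
As a rough illustration of how the two fields could round-trip through the metadata db, here is a minimal sketch. It assumes the arguments are stored as a JSON string, which the description does not state. The helper names, the example path, and the chunk sizes are hypothetical and not taken from the PR.

```python
import json


def encode_intake_fields(plugin, args):
    """Build the two metadata fields, storing the arguments as a JSON string.

    Assumption: JSON is the storage format; the PR description does not say.
    """
    return {
        "intake_plugin": plugin,
        "intake_args": json.dumps(args, sort_keys=True),
    }


def decode_intake_fields(metadata):
    """Return (plugin, args); plugin is None when no Intake entry is recorded."""
    plugin = metadata.get("intake_plugin")
    if not plugin:
        return None, {}
    return plugin, json.loads(metadata.get("intake_args") or "{}")


# Hypothetical raster dataset; the path and chunk sizes are made up.
fields = encode_intake_fields(
    "rasterio",
    {"urlpath": "/data/elevation.tif", "chunks": {"x": 512, "y": 512}},
)
plugin, args = decode_intake_fields(fields)
```

Datasets without an 'intake_plugin' value decode to `(None, {})`, so calling code can fall back to the existing Quest I/O path.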

This only replaces how a dataset is opened; the visualization and output capabilities of the Quest I/O plugins are unchanged.

from .metadata import get_metadata
from .tasks import add_async
from ..util import to_geojson
from ..static import UriType, PluginType

F401 '..static.UriType' imported but unused

from ..database import get_db, db_session
from ..static import DatasetStatus
from ..util import logger as log
from ..static import DatasetStatus, UriType

F401 '..static.UriType' imported but unused


from ...static import DatasetStatus, DatasetSource
from ...util import listify, format_json_options, uuid
from ...static import DatasetStatus, DatasetSource, UriType

F401 '...static.UriType' imported but unused

from io import StringIO
from geojson import Feature, FeatureCollection

from ..static import UriType

F401 '..static.UriType' imported but unused

import numpy as np

from quest.plugins import IoBase
from quest.static import DataType

F401 'quest.static.DataType' imported but unused


from quest import util
from quest.plugins import ToolBase
from quest.static import DataType, UriType

F401 'quest.static.UriType' imported but unused

import param
from quest import util
from quest.plugins import ToolBase
from quest.static import DataType, UriType

F401 'quest.static.UriType' imported but unused

from quest.plugins import ToolBase
from quest.api import get_metadata, update_metadata
from quest.plugins import load_plugins
from quest.static import UriType, DataType

F401 'quest.static.UriType' imported but unused


from quest.plugins import ToolBase
from quest.api import get_metadata, update_metadata
from quest.static import DataType, UriType

F401 'quest.static.UriType' imported but unused

from quest.plugins import ToolBase
from quest import util
from quest.plugins import ToolBase
from quest.static import DataType, UriType, GeomType

F401 'quest.static.UriType' imported but unused

@martindurant

Can I help at all in this process?

@sdc50
Member

sdc50 commented Mar 5, 2019

> Can I help at all in this process?

@martindurant thanks for offering to help with this. I am working to release a few bug fixes before integrating this PR, so I have not yet had time to review it. I would be interested in you reviewing what @douggallup has done with Intake in Quest and offering any suggestions you may have.

@martindurant martindurant left a comment

A few thoughts from my point of view.
Note that I am probably missing a lot of the context from the Quest side, since I am not familiar with its internals.

m = get_metadata(dataset).get(dataset)
file_format = m.get('file_format')
path = m.get('file_path')
intake_plugin = m.get('intake_plugin')

Note that the class that loads data for intake is now usually called a "driver" - there are many types of plugins. https://intake.readthedocs.io/en/latest/glossary.html

# Use intake plugin to open
if intake_plugin:
    # New code, with 'intake_plugin' added to the local .db
    plugin_name = 'open_' + intake_plugin

This seems somewhat fragile.
You could look directly in the registry

import intake
cls = intake.registry[intake_plugin]
source = cls(*args, **kwargs)

or, perhaps better, you could construct either the relevant YAML block or an intake.catalog.local.LocalCatalogEntry instance, and have Intake do the lookup for you.

Note that in the Intake world, the driver here could be something like "parquet", but it can also be the fully-qualified class name like "intake_parquet.ParquetSource". Of course, if you have additional constraints within Quest, that's fine.
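
To make the registry lookup a little more concrete, here is a minimal sketch that wraps it in a helper and reports the available drivers when the name is unknown. The function `open_source` and the stand-in class `FakeRasterSource` are hypothetical. The registry is passed in as a plain mapping so the sketch runs without Intake installed; with Intake available, the assumption is that `intake.registry` could be passed instead, as in the snippet above.

```python
def open_source(driver_name, args, registry):
    """Instantiate a data source by looking its driver class up in `registry`.

    `registry` is any mapping from driver name to class. Passing
    `intake.registry` here is an assumption, not something tested in this sketch.
    """
    try:
        cls = registry[driver_name]
    except KeyError:
        available = ", ".join(sorted(registry)) or "none"
        raise ValueError(
            f"No driver named {driver_name!r}; available: {available}"
        ) from None
    return cls(**args)


# Stand-in driver class so the sketch is runnable on its own (hypothetical).
class FakeRasterSource:
    def __init__(self, urlpath, chunks=None):
        self.urlpath = urlpath
        self.chunks = chunks


fake_registry = {"rasterio": FakeRasterSource}
source = open_source("rasterio", {"urlpath": "/data/elevation.tif"}, fake_registry)
```

Compared with building an `'open_' + name` attribute name, a failed lookup here names the drivers that are actually registered, which should make a missing plugin easier to diagnose.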

file_path = orm.Optional(str, nullable=True)
visualization_path = orm.Optional(str)
intake_plugin = orm.Optional(str)
intake_args = orm.Optional(str)

A JSON representation of arguments, correct? If these are stored as strings, would it make sense to use the same YAML spec used by Intake text-file catalogs?
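
For illustration, a minimal sketch of what mapping the two stored fields onto the `sources` / `driver` / `args` layout of an Intake text-file catalog might look like. The helper `catalog_entry`, the source name, and the example arguments are hypothetical, and the stored string is assumed to be JSON. The result is printed as JSON to keep the sketch dependency-free; a YAML library (for example PyYAML's `yaml.safe_dump`, not imported here) could write the same dictionary out as a catalog file.

```python
import json


def catalog_entry(name, driver, args, description=""):
    """Arrange one dataset's driver and arguments in catalog-style nesting."""
    return {
        "sources": {
            name: {
                "description": description,
                "driver": driver,
                "args": args,
            }
        }
    }


# Hypothetical stored value of the 'intake_args' field (assumed to be JSON).
stored_args = '{"urlpath": "/data/elevation.tif", "chunks": {}}'
entry = catalog_entry("elevation", "rasterio", json.loads(stored_args))
print(json.dumps(entry, indent=2))
```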

'elevation': 'elevation'
}

def download(self, catalog_id, file_path, dataset, **kwargs):

I do not know the context here, but you should be aware that Intake also has the ability to download source data files on first use: https://intake.readthedocs.io/en/latest/catalog.html#caching-source-files-locally
